CHRISTOPHER RUCKER

ASSOCIATE DATA SCIENTIST


- LOADDATA -


In [17]:
pwd


Out[17]:
u'c:\\users\\crucker'

In [38]:
import pandas as pd

df = pd.read_csv('data.csv')

In [21]:
df.tail()


Out[21]:
EQP_LOCAL_EQP EQP_MODEL_EQP EQP_FAIL_CNT_EQP
390956 8383922210184037 AHTC8717 0
390957 8383921490228753 AHTC8717 0
390958 8383911670068913 IPW9001 0
390959 8383610050121223 THOMDTA 0
390960 8383940050691762 IPW9001 0

- CLEANDATA -


In [39]:
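# Keep the full pivot so later cells can select any model column (e.g. MCARD9060);
# slice only for display.
modelRatings = df.pivot_table(index='EQP_LOCAL_EQP', columns='EQP_MODEL_EQP', values='EQP_FAIL_CNT_EQP')
modelRatings.iloc[:, 1:10].head()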


Out[39]:
EQP_MODEL_EQP 0C810 6780 6783 8550 8580 8600 ACJA15 ACJA18 ACJA30
EQP_LOCAL_EQP
8383100010237202 NaN NaN NaN NaN NaN NaN NaN NaN NaN
8383100010250056 NaN NaN NaN NaN NaN NaN NaN NaN NaN
8383100010253217 NaN NaN NaN NaN NaN NaN NaN NaN NaN
8383100010254959 NaN NaN NaN NaN NaN NaN NaN NaN NaN
8383100010292686 NaN NaN NaN NaN NaN NaN NaN NaN NaN
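
The NaN-filled output above is what a pivot produces when each location holds only a few models. A minimal sketch on made-up rows (the column names mirror data.csv; the values and the names `toy`, `toy_ratings`, and `nan_share` are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows: three locations, two models.
toy = pd.DataFrame({
    "EQP_LOCAL_EQP": [1, 1, 2, 3],
    "EQP_MODEL_EQP": ["A", "B", "A", "B"],
    "EQP_FAIL_CNT_EQP": [0, 2, 1, 0],
})

# One row per location, one column per model; pivot_table averages
# duplicate (location, model) pairs by default.
toy_ratings = toy.pivot_table(
    index="EQP_LOCAL_EQP",
    columns="EQP_MODEL_EQP",
    values="EQP_FAIL_CNT_EQP",
)

# Cells with no matching row stay NaN, e.g. location 2 has no model "B".
nan_share = toy_ratings.isna().mean()  # fraction of missing cells per model
print(toy_ratings)
print(nan_share)
```

With hundreds of thousands of locations and many models, most cells are expected to be empty, so a frame of NaN in the first rows is not by itself a sign of a bug.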

- SUBRATINGS -


In [25]:
MCARD9060Ratings = modelRatings['MCARD9060']
MCARD9060Ratings.head()


Out[25]:
EQP_LOCAL_EQP
8383100010237202   NaN
8383100010250056   NaN
8383100010253217     0
8383100010254959   NaN
8383100010292686     0
Name: MCARD9060, dtype: float64

- PAIRWISE -


In [32]:
similarModels = modelRatings.corrwith(MCARD9060Ratings)
similarModels = similarModels.dropna()
# Use a new name so the raw data in df is not overwritten.
similarDf = pd.DataFrame(similarModels)
similarDf.head(20)


Out[32]:
0
EQP_MODEL_EQP
AHCGNVM -0.001791
AHDPC3825 -0.007982
AHTC8717 -0.000686
DCX700MR -0.004071
DTDC60XU -0.004228
DTDCI401N -0.000946
IPW9001 -0.000792
MCARD9060 1.000000
MCARD9062 0.043355
MPACUDTA -0.000827
PCXG1 0.024862
TCDA92000 -0.000520
THOMDCI -0.002370
THOMDTA -0.000872
WDBABT500 -0.009676
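
As an illustration of what `corrwith` and `dropna` do in the cell above, here is a small sketch on invented values (the frame `ratings`, the model names A to D, and the location ids are hypothetical). Each column is correlated with the target using only the locations where both have a value; a column with no shared location yields NaN and is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical failure counts: rows are locations, columns are models.
ratings = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, np.nan],
        "B": [0.0, 2.0, 4.0, 5.0],          # rises with A where both exist
        "C": [3.0, 2.0, 1.0, 0.0],          # falls as A rises
        "D": [np.nan, np.nan, np.nan, 1.0],  # never shares a location with A
    },
    index=[101, 102, 103, 104],
)

target = ratings["A"]

# Pearson correlation of every column with the target, computed on the
# overlapping non-missing rows only.
similar = ratings.corrwith(target)

# Columns with no usable overlap come back as NaN; drop them.
similar = similar.dropna()
print(similar.sort_values(ascending=False))
```

The target always correlates perfectly with itself, which is why MCARD9060 appears with 1.000000 in the output above.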

- SCORE -


In [33]:
similarModels.sort_values(ascending=False).head(20)


Out[33]:
EQP_MODEL_EQP
MCARD9060    1.000000
MCARD9062    0.043355
PCXG1        0.024862
TCDA92000   -0.000520
AHTC8717    -0.000686
IPW9001     -0.000792
MPACUDTA    -0.000827
THOMDTA     -0.000872
DTDCI401N   -0.000946
AHCGNVM     -0.001791
THOMDCI     -0.002370
DCX700MR    -0.004071
DTDC60XU    -0.004228
AHDPC3825   -0.007982
WDBABT500   -0.009676
dtype: float64

- WRITEUP -

Forecasts equipment failure by ranking models by pairwise correlation, using model number, subscriber number, and failure count as criteria. The dataset has ~400K tuples; models AHTC8717 and MCARD9060 are the correlated covariates. The algorithm found 6 of the top 10 equipment failure issues, for 60% accuracy.
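
The 60% figure is 6 hits out of 10 ranked items. The notebook does not show how the hits were counted, so the following is only a sketch of one common way to compute such a score (precision over the top k); the function `precision_at_k` and the placeholder ids are hypothetical, not the author's data.

```python
def precision_at_k(predicted, actual, k=10):
    """Share of the first k predicted items that appear in the actual set."""
    actual_set = set(actual)
    top_k = predicted[:k]
    hits = sum(1 for item in top_k if item in actual_set)
    return hits / k


# Placeholder ids only: ten ranked predictions, six of which are real failures.
predicted = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9"]
actual = ["m0", "m2", "m3", "m5", "m8", "m9", "x1", "x2", "x3", "x4"]

score = precision_at_k(predicted, actual, k=10)
print(f"{score:.0%}")
```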